ABSTRACT

Since winter 2008/2009, an L-band 7-feed-array receiver has been used for a 21-cm line survey
performed with the 100-m telescope, the Effelsberg-Bonn H I Survey (EBHIS). The EBHIS will
cover the whole northern hemisphere for decl. > −5°, comprising both the galactic and
extragalactic sky out to a distance of about 230 Mpc. State-of-the-art FPGA-based digital fast
Fourier transform spectrometers, superior in dynamic range and temporal resolution to
conventional correlators, allow us to apply sophisticated radio frequency interference (RFI)
mitigation schemes.

In this paper, the EBHIS data reduction package and first results are presented. The
reduction software consists of RFI detection schemes, flux and gain-curve calibration,
stray-radiation removal, baseline fitting, and finally the gridding to produce data cubes.
The whole software chain has been successfully tested using multi-feed data toward many
smaller test fields (1-100 deg^2) and was recently applied for the first time to data of two
large sky areas, each covering about 2000 deg^2. The first large area is toward the northern
galactic pole and the second one toward the northern tip of the Magellanic Leading Arm. Here,
we demonstrate the data quality of EBHIS Milky Way data and give a first impression of the
first data release in 2011.

Subject headings: methods: data analysis — techniques: spectroscopic 



1. Introduction 

Blind H I surveys allow us to explore a vari-
ety of objects. The Effelsberg-Bonn H I Survey
(EBHIS) covers not only the Milky Way but also 
the local universe out to a redshift of 0.07. Ac- 
cordingly, EBHIS will serve as a major database 
for many scientific questions, addressing, e.g., the 
structure and mass distribution of the Milky Way, 
the size distribution of halo clouds, or the struc- 
ture formation of the local universe. A complete 
description of the scientific aims will be given in a 
forthcoming paper. Here, the technical setup and 
the data reduction software as used for the survey 
are presented. 

Today the most comprehensive database for galactic H I science is the
Leiden/Argentine/Bonn survey (LAB; Kalberla et al. 2005), the first all-sky
survey corrected for stray radiation (SR). Toward high galactic latitudes,
the faint Milky Way emission from the direction of interest can be severely
degraded by radiation of the Milky Way disk entering the receiver system via
the near and far side lobes. SR correction is crucial for quantitative
analyses of most of the galactic sky using conventional single dish optics.
The Parkes and Arecibo telescopes perform multiple consecutive surveys to
observe the galactic and extragalactic sky. The data acquisition of the
Parkes Galactic All Sky Survey (GASS; McClure-Griffiths et al. 2006, 2009) is
already completed and the final data products are released (Kalberla et al.
201). The Galactic ALFA (GALFA; Goldsmith 200; Heiles et al. 2004) is still
in progress. In the extragalactic regime, the H I Parkes All-Sky Survey
(HIPASS; Barnes et al. 2001; Meyer et al. 2004) mapped the complete southern
hemisphere, detecting more than 5000 galaxies and providing a valuable
database to infer the H I properties of galaxies in the local universe. The
Arecibo Legacy Fast ALFA survey (ALFALFA; Giovanelli et al. 2005) is ongoing
and will result in deeper data at higher angular resolution, though it is
limited to a smaller area of about 7000 deg^2.

Since the beginning of 2009, an L-band 7-feed-array receiver system has been
available at the 100-m telescope in Effelsberg for astronomical measurements.
We use this instrument to perform a fully sampled H I survey of the northern
hemisphere for decl. > −5°, observing the galactic and extragalactic sky in
parallel. Multiple coverages (at different hour angles and seasons) will
allow us to disentangle radio frequency interference (RFI), SR, and baseline
problems. Redundancy allows us to remove instrumental biases significantly.
The survey area is subdivided into two sky areas: one toward the Sloan
Digital Sky Survey (SDSS; Adelman-McCarthy et al. 2008) area, where an
integration time of 10 minutes is projected, while the remaining sky will be
integrated for 2 minutes. This yields a full-sky survey superior in
sensitivity, angular sampling, and resolution to any previously performed
large H I survey (Kalberla & Kerp 2009). The data of the first full sky
coverage will be released in 2011.

The data analysis of a large-area H I survey is undoubtedly a major
challenge. Modern receiving systems offer scientifically usable bandwidths of
several tens to hundreds of MHz to study the H I at moderate redshifts.
Digital high-dynamic-range spectrometers based on Field Programmable Gate
Arrays (FPGAs; Stanko et al. 2005; Klein et al. 2006) allow us to store
spectra on time scales of less than 1 s, which is essential for applying
sophisticated RFI mitigation procedures (Winkel et al. 2007). The receiving
system as used for EBHIS provides a bandwidth of 100 MHz, which covers the
redshift range out to z ~ 0.07, while the 16384 spectral channels yield a
velocity resolution of 1.25 km s^-1. Spectral dumps are recorded every
500 ms. As a drawback, though, one has to deal with very high data rates. For
the EBHIS, typical values are about 5 GB hr^-1, making the calibration and
analysis of EBHIS data a task which is impossible to accomplish manually.
Consequently, the data reduction algorithms and software developed were
optimized with respect to computational efficiency and performance.

The paper is organized as follows. In Section 2 the receiving system and
technical setup are described. During various test measurements the quality
of the new receiving system was investigated, the results of which are
discussed in Section 3. Our data reduction package, which was developed for
the EBHIS, is presented in detail in Section 4, and we show its successful
application to an example data set in Section 5. A summary is given in
Section 6.

2. Technical setup 
2.1. Receiver 

The 21-cm multi-feed receiver is a cooled single 
conversion heterodyne system offering 14 separate 
receiving channels. The electromagnetic waves are 
focused by the antenna and coupled via the feed 
horns into waveguides placed in a cryogenic de- 
war which is situated in the primary focus of the 
telescope. The central feed is sensitive for circu- 
lar polarization while the off-set feeds receive only 
linear polarized signals. According to this setup, 
the offset feeds are expected to be more sensitive 
for a broader variety of RFI signals. An initial 
amplification of 40 dB is applied to the spectral 
band between 1200 and 1700 MHz. A lot of effort 
went into arranging the beams providing best pos- 
sible beam efficiency while minimizing inter-beam 
coupling. Before down-conversion to the interme- 
diate frequency (IF) using a local oscillator (LO) 
further filtering is applied, limiting the bandwidth 
to the range 1290-1430 MHz. The IF band lies be- 
tween 80 MHz and 220 MHz (the center frequency 
is 150 MHz). The receiver utilizes a noise diode at constant temperature to
monitor relative modulations of the system temperature. The beams are placed
on a hexagonal grid with a separation of 15', or 1.6 beam widths. In Figure 1
the measured beam pattern for the multi-feed array is shown. A more detailed
description can be found in Keller et al. (2006).



2.2. Backend 

For the EBHIS a fast Fourier transform (FFT) 
spectrometer developed at the Max-Planck- Institut 
fur Radioastronomie (MPIfR) is used. It is based 
on a Xilinx Virtex-4 FPGA, fed by an analog- 
digital converter (ADC) that has a sampling rate 
up to 3 GHz at 8 bit dynamics. The original design was introduced for
molecular spectroscopy at the APEX telescope (Klein et al. 2006). It was
adapted to match the desired specifications for an H I survey. Since 2008
August seven backends, each equipped with two spectral channels, have been
installed. They utilize a total bandwidth of 100 MHz and 16k spectral
channels, providing an effective frequency resolution of 7.1 kHz (equivalent
noise bandwidth, ENBW; Klein et al. 2006), yielding a velocity resolution of
1.45 km s^-1, which is only marginally larger than the channel separation of
6.1 kHz. Every 500 ms an integrated spectrum (abbreviated henceforward as
"spectral dump") is transmitted via Ethernet to a server, which stores the
raw data (in binary file format) on disk.

Fig. 1. Antenna pattern of the seven-beam receiver as measured by E. Fürst
(MPIfR) using the continuum backend. The contour lines mark the sensitivity
distribution (in dB) of the receiver system as a function of angular distance
from the source. The cross-like structure is caused by the feed-support legs
of the primary focus cabin. The off-set feeds are affected by coma, leading
to the obvious elongation of the sensitivity pattern.

2.3. Survey strategy 

The survey will be carried out in three major steps. The first part, the
testing phase, is already finished. Here, we mapped smaller portions of the
sky to test the instrument and software. Currently, the complete northern
hemisphere is being observed with an effective integration time of 2 minutes
per position. Finally, the SDSS area will be observed via multiple coverages,
yielding the targeted integration time of 10 minutes. Fields of 5° × 5° are
measured with on-the-fly R.A.-Decl. scanning using a tangential plane
projection, which will lead to a homogeneous noise distribution all over the
sky. The hexagonal feed pattern is rotated by 19° with respect to the
scanning direction such that the sampled scan lines (denoted as subscans) are
equidistantly spaced. To keep this feed angle fixed in the R.A.-Decl. system,
the dewar must be rotated according to the parallactic angle.
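
The rotation angle quoted above can be checked with a little geometry. The following is a minimal sketch, assuming an ideal hexagonal layout (one central feed plus six feeds on a circle of radius 15'); the function name and the idealized geometry are illustrative and not taken from the EBHIS software. For such a layout, a rotation of arctan(sqrt(3)/5), about 19.1°, makes the seven cross-scan offsets equidistant.

```python
import math

# Beam separation from the text; the ideal hexagon is an assumption.
BEAM_SEPARATION_ARCMIN = 15.0


def cross_scan_offsets(rotation_deg, separation=BEAM_SEPARATION_ARCMIN):
    """Offsets of the seven feeds perpendicular to the scan direction."""
    offsets = [0.0]  # central feed
    for k in range(6):
        angle = math.radians(rotation_deg + 60.0 * k)
        offsets.append(separation * math.sin(angle))
    return sorted(offsets)


# Rotation angle that makes the seven scan lines equidistant.
rotation_deg = math.degrees(math.atan(math.sqrt(3.0) / 5.0))
offsets = cross_scan_offsets(rotation_deg)
spacings = [b - a for a, b in zip(offsets, offsets[1:])]

print(f"rotation = {rotation_deg:.2f} deg")
print("scan-line spacings (arcmin):", [round(s, 2) for s in spacings])
```

With these assumptions the subscans of one scan are separated by roughly 4.9', i.e., about a third of the feed separation.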

3. Receiver quality 

Several test measurements have been con- 
ducted, not only to test the new multi-beam 
receiver and the FFT spectrometers but also 
to check the quality of the data reduction soft- 
ware. The first test measurements (2007 Novem- 
ber 20/21 and 23/24) were intended to investigate 
the overall system performance by carrying out 
Allan tests, measurements of the system tempera- 
ture, and bandpass stability. During these obser- 
vations a strong RFI signal produced by terrestrial 
digital radio (DAB) at a frequency of 1450 MHz 
was observed (located in frequency outside of the 
bandpass filter). This strong terrestrial irradia- 
tion caused highly variable baselines for the cen- 
tral (circularly polarized) feed. As a consequence, 
new stop-band filters were installed, having much 
higher sideband suppression. After that we iden- 
tified a significant feed resonance producing a 
strong broad Gaussian-like signal occurring at a 
frequency of about 1435 MHz. To minimize its 
influence the LO setup was optimized accordingly. 
The emission of the Milky Way is detected close 
to the high-frequency limit of the bandpass edge. 

It is very important to show that the noise is 
free from systematic effects. For an ideal radio 
receiver the radiometer equation is applicable 



    P(t) \propto \frac{1}{\sqrt{t\,\Delta f}},    (1)



stating that the noise power P(t) decreases with the square root of the
integration time t if a constant bandwidth Δf is used. In practice, each
system suffers from instabilities on a certain timescale t_A, yielding a
divergence from the theoretical behavior. A system should therefore be
designed in a way to maximize t_A. One possibility of determining t_A for the
whole receiving system is to compute an Allan plot.

Fig. 2. Allan plot calculated for spectral data, as observed during test
measurements in 2007 November. The plot shows the noise behavior of one of
the offset feeds. Each spectral dump is integrated for 450 ms (separated in
time by 2 s). The plot reveals very good receiver stability up to the total
integration time of about 1000 s.

Several thousands of spectral dumps were recorded and subsequently integrated
(e.g., in steps of 2^n) to evaluate the noise behavior on the timescale t_n.
In Figure 2 an Allan plot for one of the offset feeds is shown. The data show
the expected behavior up to an integration time of about 1000 s, which is
sufficient for the deepest observations (600 s) we are aiming for.
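
To illustrate how such an Allan plot is built, the sketch below bins a time series in steps of 2^n dumps and computes the two-sample (Allan) variance of the binned means. It is a minimal example under stated assumptions: pure white noise stands in for one spectral channel of an ideal receiver, and the function name is hypothetical. For white noise the product of variance and bin size stays roughly constant, which is the behavior predicted by Equation (1); a real receiver departs from it beyond t_A.

```python
import random


def allan_curve(samples, max_power):
    """Two-sample (Allan) variance of binned means for bin sizes 2**n."""
    curve = []
    for n in range(max_power + 1):
        size = 2 ** n
        n_bins = len(samples) // size
        means = [
            sum(samples[i * size:(i + 1) * size]) / size
            for i in range(n_bins)
        ]
        # Half the mean squared difference of adjacent bin averages.
        diffs = [(b - a) ** 2 for a, b in zip(means, means[1:])]
        curve.append((size, 0.5 * sum(diffs) / len(diffs)))
    return curve


random.seed(1)
# White noise with unit variance: an idealized, perfectly stable receiver.
noise = [random.gauss(0.0, 1.0) for _ in range(2 ** 16)]
curve = allan_curve(noise, max_power=6)
for size, var in curve:
    print(f"{size:4d} dumps: variance * size = {var * size:.3f}")
```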

In Figure 3 (lower panel), the system temperatures for the central feed are
shown as derived from observations of the standard calibration source S7.
Data for three different observational periods are plotted separately.
Figure 3 shows two main effects: as expected, T_sys is strongly elevation
dependent, i.e., lower elevations lead to higher irradiation from the ground
(Figure 3, middle panel), and seasonal effects are apparent. The maximal
system temperature is higher during summer than during the winter term. The
remaining scatter can be attributed to local weather conditions and the
averaging of the data over a longer period (about a month). The system
temperature of the central feed ranges typically between 22 and 35 K. The
corresponding temperatures of the offset feeds are marginally higher, ranging
between 24 K and 39 K. Note that the apparently high system temperatures in
some of the measurements can be attributed entirely to the low elevation of
the calibration source.

Fig. 3. System temperatures obtained using the IAU standard calibration
source S7 for three different observational periods. In the upper and middle
panels, the horizontal coordinates of S7 are plotted as a function of time.
The lower plot contains the resulting system temperatures.



4. The EBHIS data reduction 

Fig. 4. The data reduction scheme used for EBHIS is organized in a highly
flexible way. Each task is optimized to work as independently as possible on
the individual spectral dumps. The computed correction parameters are stored
within an SQL database to allow an easy query of necessary parameters (e.g.,
calibration factors, RFI flags) from other tasks.

The major reduction tasks are flux calibration, gain curve (bandpass)
correction, SR correction, baseline fitting, and RFI detection. The processed
spectra are eventually merged by a gridding tool into a three-dimensional
data cube. Deviating from the common pipeline approaches, the EBHIS
reduction is organized in a highly flexible manner.
To minimize redundancies, each processing step 
works as independently as possible on the data. To 
achieve this goal, every correction to be applied on 
the data is described by a minimal set of parame- 
ters which is then stored in an SQL database. For 
example, the RFI detection algorithm returns a 
list of spectral channels per spectral dump. Only 
this short list consisting of a few integer entries 
needs to be stored. Every other processing soft- 
ware can then easily query the database to obtain 
these lists and flag contaminated data. Likewise, the gain calibration
requires only a few numbers to be stored per observation session. Tasks that
rely on calibrated input spectra simply read the raw data from disk and apply
the calibration on the fly.
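
The database idea can be sketched in a few lines. The example below is illustrative only: it uses Python's built-in sqlite3 module with an in-memory database, and the table name, column names, and helper functions are hypothetical, not the schema of the EBHIS software. One task stores the short list of flagged channels per spectral dump; any other task queries that list and masks the raw spectrum on the fly.

```python
import sqlite3

# In-memory database with a hypothetical, minimal flag table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rfi_flags (dump_id INTEGER, channel INTEGER)")


def store_flags(dump_id, channels):
    """Called by the RFI detection task: store one short list per dump."""
    conn.executemany(
        "INSERT INTO rfi_flags (dump_id, channel) VALUES (?, ?)",
        [(dump_id, ch) for ch in channels],
    )


def flagged_channels(dump_id):
    """Called by any other task: query the flags for one dump."""
    rows = conn.execute(
        "SELECT channel FROM rfi_flags WHERE dump_id = ? ORDER BY channel",
        (dump_id,),
    )
    return [row[0] for row in rows]


store_flags(17, [2100, 2101, 9000])

# A downstream task reads the raw spectrum and masks flagged channels.
spectrum = [1.0] * 16384
for ch in flagged_channels(17):
    spectrum[ch] = None
```

Because only the flag list is stored, rerunning the RFI detection replaces a few rows without touching the raw data or the output of other tasks.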

If the individual modules do not depend on each other, this modular approach
to the data reduction chain makes it possible to update a single module
without having to re-execute other reduction tasks. In that case, different
tasks can even work in parallel. A schematic representation of the EBHIS data
processing chain is shown in Figure 4. Solid arrows show the flow of the
spectral data, while dashed lines show the information exchange with the
databases. Every task stores calculated values in an associated database
table and can query all other databases as necessary. In contrast to pipeline
approaches, a new task, denoted as "merger", is necessary to apply all the
correction terms and generate the final data set. After that, the spectra can
be gridded to a data cube.

As the integration time per dump is 500 ms, a significant amount of data has
to be handled for the full survey (about 25 Tbyte). We developed serial
algorithms to allow processing of large data sets without the need to keep
them in the main memory (RAM), which is especially important for the gridding
task. Multi-threading techniques are extensively used to significantly speed
up computation on multi-processor/-core platforms. A further improvement in
processing speed results from the database approach, which allows
simultaneous access from multiple workstations. The database also provides
the opportunity to easily query information, e.g., on sky areas already
observed (including visualization), and can be backed up very efficiently.

A typical workflow is as follows. RFI detection, flux calibration, and gain
curve calibration are independent and can be processed in parallel. From the
calibrated spectra, the SR correction is subtracted,
baselines are computed, and finally the gridder 
computes the data cubes. However, an accurate 
SR correction needs in principle an iterative ap- 
proach, where the resulting data of one iteration 
are fed into an improved SR model to be sub- 
tracted in the next iteration. 

4.1. RFI mitigation 

In radio astronomy, the signals of interest are in general polluted by
man-made radio emission called RFI. RFI is produced by a broad variety of
emitters. Because of the high sensitivity of modern radio telescopes, even
the faintest RFI irradiation can corrupt the observational data. Even within
the radio astronomical protected bands one has to deal with "legal" RFI
irradiation in adjacent spectral regimes, leaking part of their radiation
power into the protected band below the irradiation threshold. Digital
circuits, common in modern electronic devices, pollute their environment with
a multitude of harmonics. RFI irradiation covers a huge dynamic range, from
events close to the statistical noise level up to strong bursts which can
even lead to the saturation of the whole receiving system.

Two practical strategies are used today to deal with RFI: keeping artificial
radiation away from the telescope (passive mitigation) or trying to detect or
even mitigate RFI once it has entered the receiving system (active
mitigation). For the latter, several methods have been developed so far.
First, a manual search for RFI signals in the spectra can be carried out (the
"classical" method). Second, using sophisticated algorithms it is possible to
search for RFI signals in recorded data automatically (Bhat et al. 2005).
Third, one can use real-time applications which have to be implemented into
the receiver chain of the telescope. Various approaches are under
consideration, e.g., adaptive filters (Bradley & Barnbaum 199),
post-correlators (Briggs et al. 2000), or real-time higher order statistics
(HOS; see Fridman 2001).

For EBHIS we follow the second approach as no hardware solution is available
at the 100-m telescope. It became obvious during test observations (Winkel et
al. 2007) that RFI amplitudes at Effelsberg either vary on timescales of less
than 100 ms or are relatively stable. To increase the detection rate for the
former, spectral data at high temporal resolution are desired. The necessary
short integration times yield very high data rates and low signal-to-noise
spectra. In practice, one has to find a compromise between reasonable time
resolution for RFI detection and the amount of recorded data. Today, only
modern FPGA-based FFT spectrometers provide the dynamic range and temporal
resolution needed for off-line RFI detection applications.

The aim for EBHIS was to follow the approach presented in detail by Winkel et
al. (2007), which is based on a two-step procedure.

• Identification of spectral features. Base- 
line effects are minimized by a fitting pro- 
cedure of the data in the time-frequency do- 
main, e.g., using two-dimensional polynomi- 
als. Robustness of the fit is ensured by au- 
tomatically setting "windows" around line 
emission or possible RFI signals. 

• Statistical and morphological considerations 
are applied to distinguish between astronom- 
ical emission and RFI. 



These statistical detection strategies are based on the high temporal
variability of interference signals, while the astronomical signal needs to
be considered as constant over the time interval used for the analysis.
However, in order to complete a first full coverage of the northern
hemisphere we decided to scan the sky rather fast (at a rate of about
3' s^-1), which unfortunately provides only a few dumps per beam size.
Furthermore, some of the narrow band interference signals are very constant
in time. On the other hand, almost all RFI events are very sharp (1-2 pixels
wide) in the time-frequency plane.

Because of that, we chose for the initial data reduction a less complex
approach based on cross-correlation of the spectral data s(x) with a template
p(x) (of size n),

    C(x) = \sum_{i=-n/2}^{+n/2} (p(i) - \bar{p}) \cdot (s(x+i) - \bar{s}).    (2)

To best match the impulse-like shapes of RFI we 
use the templates 



p(i) = 




(3) 



They are either applied to spectral data or to the time series of each
spectral channel. In practice, strong RFI peaks have a large impact on
\bar{s} and \bar{p} \cdot s(x), producing troughs in the correlation
spectrum. To avoid this effect, we set \bar{p} = 0 and replace the mean
estimator \bar{s} by the median, which yields a higher robustness against
outliers and leads to convincing results (Figure 5). The correlation spectrum
is searched for values in excess of 4σ in order to flag bad data points.
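
A minimal sketch of this detection step is given below. It implements Equation (2) with the template mean set to zero and the spectrum mean replaced by the median, then flags correlation values more than 4σ above the median. Several details are assumptions made for illustration and are not taken from the paper: the three-point impulse-like template `TEMPLATE`, the noise estimate based on the median absolute deviation, and the deterministic stand-in for noise in the demo data.

```python
import statistics

# Hypothetical impulse-like template; NOT the templates of Equation (3).
TEMPLATE = (-0.5, 1.0, -0.5)


def correlate_with_template(spectrum, template=TEMPLATE):
    """Equation (2) with p-bar = 0 and s-bar replaced by the median."""
    half = len(template) // 2
    baseline = statistics.median(spectrum)
    corr = [0.0] * len(spectrum)
    for x in range(half, len(spectrum) - half):
        corr[x] = sum(
            template[i + half] * (spectrum[x + i] - baseline)
            for i in range(-half, half + 1)
        )
    return corr


def flag_rfi(spectrum, template=TEMPLATE, n_sigma=4.0):
    """Return channel indices whose correlation exceeds n_sigma."""
    corr = correlate_with_template(spectrum, template)
    med = statistics.median(corr)
    # Robust scatter from the median absolute deviation (an assumption).
    sigma = 1.4826 * statistics.median(abs(c - med) for c in corr)
    return [x for x, c in enumerate(corr) if c - med > n_sigma * sigma]


# Demo: deterministic small fluctuations plus one narrow spike.
spectrum = [0.1 * ((4 * i) % 11 - 5) for i in range(200)]
spectrum[50] += 20.0
flagged = flag_rfi(spectrum)
print("flagged channels:", flagged)
```

The same function can be applied along the time axis by passing the time series of a single spectral channel instead of a spectrum.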

Figure 6 shows two regions of the time-frequency
plane (upper panels) and the result of the flagging 
algorithm (bottom panels). The left panels con- 
tain not only a continuum source, but also a bright 
galaxy, neither of which produces a false detection. 
The right panels exhibit several broadband RFI 
events as well as lots of narrow-band signals which 
are correctly flagged by the procedure. 

Note that the specific choice of our templates is suited to optimize the
detection rate for the narrow (in time or frequency) RFI signals which make
up almost all events. To further improve the detection efficiency we use
variable threshold levels as a function of the number of feeds and
polarization channels containing the same RFI event. A more detailed
explanation of the algorithms, together with results of simulations aimed at
quantitatively estimating the RFI detection efficiency of our software, will
be given in Floer et al.

Fig. 5. Example spectrum (upper panel) and the applied cross-correlation with
a template (lower panel) used for our RFI detection algorithm.

4.2. Flux calibration

4.2.1. Iterative Gaussian smoothing

Many of the subsequent data reduction tasks rely on the computation of
baselines or smoothed data for various purposes. Any RFI signals would highly
distort the usual polynomial fits or Gaussian smoothing algorithms. Having an
RFI flag database, contaminated data points can be flagged such that the
results are not affected by these outliers. However, it is not always
guaranteed that all interference signals were correctly identified, and the
line emission of the astronomical signals of interest would still have an
impact, e.g., on the baseline fits. Therefore, a more robust algorithm was
developed, iterative Gaussian smoothing (IGS), which does not rely on an RFI
flag database. It works iteratively: (1) low-pass filtering the input data
using a Gaussian kernel, (2) searching for outliers in the residual (i.e.,
peaks above a threshold of 3σ in the residual spectrum, which are flagged),
and (3) replacing flagged data points in the input data with either
interpolated or prior values (if available). Usually the approximation
converges after a few iterations.
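
The three steps can be sketched as follows. This is an illustrative implementation under stated assumptions, not the EBHIS code: the kernel width is given in channels, edges are handled by clamping indices, the scatter is estimated from the median absolute deviation, only positive peaks are flagged, and flagged points are replaced by linear interpolation between the nearest unflagged neighbors. All function names are hypothetical.

```python
import math
import statistics


def gaussian_kernel(sigma, radius):
    weights = [math.exp(-0.5 * (i / sigma) ** 2)
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(data, kernel):
    """Low-pass filter; indices are clamped at the edges (assumption)."""
    radius, n = len(kernel) // 2, len(data)
    return [
        sum(w * data[min(max(x + j - radius, 0), n - 1)]
            for j, w in enumerate(kernel))
        for x in range(n)
    ]


def interpolate_flagged(data, flagged):
    """Replace flagged points by linear interpolation of good neighbors."""
    out, n = list(data), len(data)
    for i in sorted(flagged):
        left, right = i - 1, i + 1
        while left >= 0 and left in flagged:
            left -= 1
        while right < n and right in flagged:
            right += 1
        if left >= 0 and right < n:
            t = (i - left) / (right - left)
            out[i] = (1 - t) * data[left] + t * data[right]
        elif left >= 0:
            out[i] = data[left]
        elif right < n:
            out[i] = data[right]
    return out


def iterative_gaussian_smoothing(data, sigma=5.0, n_sigma=3.0, max_iter=10):
    kernel = gaussian_kernel(sigma, radius=int(3 * sigma))
    work, flagged = list(data), set()
    for _ in range(max_iter):
        model = smooth(work, kernel)                        # step (1)
        residual = [d - m for d, m in zip(data, model)]
        good = [r for i, r in enumerate(residual) if i not in flagged]
        med = statistics.median(good)
        scatter = 1.4826 * statistics.median(abs(r - med) for r in good)
        new = {i for i, r in enumerate(residual)            # step (2)
               if i not in flagged and r - med > n_sigma * scatter}
        if not new:
            break
        flagged |= new
        work = interpolate_flagged(data, flagged)           # step (3)
    return smooth(work, kernel), sorted(flagged)


# Demo: sloped baseline, small deterministic fluctuations, one spike.
data = [10.0 + 0.01 * i + 0.1 * ((4 * i) % 11 - 5) for i in range(200)]
data[100] += 50.0
baseline, flagged = iterative_gaussian_smoothing(data)
print("flagged:", flagged, "baseline at spike:", round(baseline[100], 2))
```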

4.2.2. Gain curves

The multi-beam receiver uses the heterodyne 
principle where the radio frequency (RF) signal 
is mixed with a monochromatic signal of an LO. 
An appropriate low-pass filter applied after this 
operation provides the desired IF signal at much 
lower carrier frequencies. The whole system can 
be described by 



    P_{\rm IF} = G_{\rm IF} G_{\rm RF} \left[ T_{\rm A} + T_{\rm sys} \right],    (4)

with P_IF being the power in the IF chain. G_IF and G_RF are the
frequency-dependent gain functions at IF and RF stages, respectively. The
gain acts on the astronomical signal of interest, T_A, plus the contribution
T_sys incorporating noise due to sky, ground radiation (scattering and
spillover), and thermal noise of the receiver and calibration diode, etc. In
our case, we also observe a rather strong multi-modal sine-wave pattern which
is dependent on elevation, feed rotation angle, and polarization channel. We
will discuss this issue in more detail in a subsequent paragraph.

The EBHIS data are observed using in-band frequency switching with a
frequency shift of 3 MHz. In the near future the receiving system may be
upgraded to support the recently developed least-squares frequency switching
(LSFS; Heiles 2007) method, for which Winkel & Kerp (2007) proposed further
improvements. LSFS requires changes to the hardware (i.e., to provide more
than two LO frequencies), which have not yet been introduced at the 100-m
telescope. However, both methods can fail in the case of non-trivial RF gain
curves, an issue which is discussed at the end of this subsection. Here, we
make no specific use of the in-band frequency switching technique in order to
remove gain curve effects.

Fig. 6. For two distinct spectral ranges, the results of the RFI detection
procedure are visualized. The upper panels contain raw spectral density plots
as a function of the spectral channel and spectral dump (each dump is a
500 ms snapshot). The lower panels show the identified RFI candidates as
black areas. Note that the continuum source (around spectral dump 90) and the
spectral line of a galaxy (around spectral channel 2100, dump 70) were not
flagged, as desired. Sharp features of the MW emission (spectral channel
~ 1550), however, can sometimes trigger a false detection. The algorithm also
works well on broadband events and "forests" of narrow band RFI as shown in
the bottom right panel.

To calibrate the data, recorded in arbitrary units, i.e., spectrometer
counts, the product G = G_IF(ν_IF) G_RF(ν_RF) has to be determined. This can
be done using the built-in noise diode, which is fed into the system at every
second spectral dump. Then

    P_{\rm IF}^{\rm cal} - P_{\rm IF} = G \left[ T_{\rm A}^{\rm cal} + T_{\rm sys}^{\rm cal} + T_{\rm cal} - T_{\rm A} - T_{\rm sys} \right] = G\,T_{\rm cal}    (5)

as long as the condition T_A = T_A^cal and T_sys = T_sys^cal
applies for adjacent spectral dumps. Furthermore,
this approach assumes that the output of the noise 
diode is fed into the very beginning of the sig- 
nal processing chain, which is not exactly true for 
the seven-feed receiver. For this instrument, the diodes are embedded in the
waveguides connected to the feed horns. Hence, the influence of the feeds
themselves is not taken into account. Furthermore, the functional form of G
can only be reconstructed if T_cal is frequency independent. Fortunately,
this is indeed the case to high precision for the 21-cm seven-feed receiver
(R. Keller, private communication 2009).
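
The difference of adjacent dumps with and without the noise diode, Equation (5), translates directly into code. The sketch below is a minimal illustration with synthetic numbers; the function name and all values are hypothetical, and the example assumes a frequency-independent T_cal and identical T_A and T_sys in both dumps, as required in the text.

```python
def gain_from_noise_diode(p_cal, p_ref, t_cal):
    """Per-channel gain G = (P_IF^cal - P_IF) / T_cal, after Equation (5)."""
    return [(on - off) / t_cal for on, off in zip(p_cal, p_ref)]


# Synthetic example (all numbers hypothetical).
true_gain = [100.0, 120.0, 90.0]      # spectrometer counts per K
t_a, t_sys, t_cal = 2.0, 25.0, 5.0    # K

# Adjacent dumps without and with the noise diode switched on.
p_ref = [g * (t_a + t_sys) for g in true_gain]
p_cal = [g * (t_a + t_sys + t_cal) for g in true_gain]

gain = gain_from_noise_diode(p_cal, p_ref, t_cal)
print(gain)
```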

The thermal noise temperature of the calibration diode, T_cal, is known, such
that one can obtain G = (P_IF^cal − P_IF)/T_cal. However, a better precision
of the absolute flux calibration can be reached by using an astronomical
calibration source. For this purpose, we utilize IAU standard calibration
sources (usually S7, because it is circumpolar for Effelsberg, sometimes also
S8). The spectral line flux of these calibrators is well known (Kalberla et
al. 1982), allowing one to determine the gain factor, g = G(v_lsr = 0), with
an accuracy of typically 2%. Note that the calibrators are not continuum
sources but well-defined H I regions in the Milky Way; hence they only
provide the gain for v_lsr = 0. In Figure 7, values of g for a time range of
a few weeks are shown. There is a slight dependence on time, which is
expected for a typical receiving system. To measure the scatter about the
long-term behavior a third-order polynomial was fitted, showing a residual
rms of about
2.5%. In practice, g is measured by performing a polynomial fit to the
calibration source spectra to separate the spectral line from the baselines
and calculating

    g = \frac{\sum_{v_1}^{v_2} P_{\rm IF}(v_{\rm lsr})}{S_{\rm cal}},    (6)

where v_{1,2} are the integration limits for the chosen IAU position
according to Kalberla et al. (1982).

Fig. 7. Calculated gain factors, g = G(v_lsr = 0), obtained using the
standard calibration source S7 over a period of about a month. There is a
small overall drift, which was fitted using a third-order polynomial. The
remaining scatter is about 2.5% with few outliers, which could be flagged as
bad measurements. Usually, the value of the polynomial fit is used for the
gain calibration.

The complete gain curve is easily computed by normalizing G T_cal to a
dimensionless curve \tilde{G} with \tilde{G}(v_lsr = 0 km s^-1) = 1 and
applying the absolute flux calibration (as obtained using the IAU standard
calibration sources) by multiplying with g:

    G = g\,\tilde{G}.    (7)
In Figure 8 some of the obtained gain curves G are shown. They were computed
by calculating the median of G of all spectral dumps. Usually this median
gain curve contains only few outliers due to RFI signals. For later
application of G to the spectra the gain curves are smoothed with IGS (using
a filter kernel width of σ = 64 kHz) to reduce residual noise. Panels (a) and
(b) contain the left-hand polarization channel of the central feed for the
two different frequency shifts. Several ripples are visible which follow the
LO shift and, hence, can be attributed to the RF part of the gain curve. It
is possible that these features are caused by resonances in the waveguide
connecting the feed horn antenna with the receiver. In panel (c), a linear
polarization channel of one of the offset feeds is shown.

Fig. 8. Gain curves G for different measurements. Panels (a) and (b) contain
the left-hand polarization channel of the central feed for the two different
frequency shifts. Panel (c) shows the gain curve of one of the offset feeds
of the same data set. The noisy results (gray solid lines) were overplotted
with a smoothed curve (black solid lines) as obtained after filtering with
IGS. In panel (d), G (smoothed) is plotted for three different measurements
(central feed) to investigate the long-term stability of the gains. The
overall shape of G is rather stable, though there are some features,
especially in the center of the band, which vary more strongly.

Figure 9 shows spectra (integrated over one sub-
scan) for the two different LO frequencies before
(upper panel) and after (lower panel) calibration. 
Features in the raw spectra do not match exactly. 
This is due to the frequency dependence of G. Af- 
ter applying the gain calibration, both LO setups 
produce nearly identical results (apart from RFI 
and noise contribution). The baseline level of the 
calibrated spectra is by definition equal to the sys- 
tem temperature. 

Fig. 9. — Raw (top panel) and calibrated (bottom panel) spectra (integrated over one subscan) for the two different LO frequencies. The latter were computed by dividing the raw spectra by the calculated gain curves G (see Figure 8) and multiplying with the absolute flux calibration values g. In the calibrated spectra, $T_\mathrm{A} + T_\mathrm{sys}$ matches very well in both LO phases. The curve for LO 2 was displaced by $\Delta T = 5$ K for better visualization. The baseline level of $T_\mathrm{A} + T_\mathrm{sys}$ determines the effective system temperatures valid for each radial velocity.

The spectra shown in the lower panel of Figure 9 also reveal multi-modal sine-wave contributions. Usually such a pattern is attributed to standing waves (SWs) between the primary and secondary focus. We tested this hypothesis by performing test measurements with the single-feed receiver (which is constructed in a similar way as the multi-feed). For technical reasons the sub-reflector at the 100-m telescope is tilted when observing with this instrument. These tests showed a significant reduction of the SW amplitudes. Furthermore, for the multi-beam receiver two distinct observations having a feed rotation angle differing by 180° but measured at similar elevation were inspected. One could expect to find the same SW contribution but only in opposite (linearly polarized) feed horns of the hexagonal array; however, this is not the case. Therefore, we have to treat the problem in a non-analytical way, e.g., by baseline fitting or using a series of sine waves as proposed by Peek & Heiles (2008).



In order to test the stability of the determined gain curves we compared G for three different measurements (see Figure 8(d)). There are minor deviations between the observations. The relative change of the gain curve as a function of time during a single measurement is plotted in Figure 10 (top panel). It was computed by dividing the median of G of the total measurement by the median of G of each subscan. Except for a temporal variability in the center of the spectral band, the residuals are flat, showing that the gain curve is stable. Figure 10 (lower panel) shows for the same measurement residual changes in $T_\mathrm{A} + T_\mathrm{sys}$ (again by computing the median for each subscan minus the total median). Two effects are obvious. First, there is an overall dependence on the elevation angle, which is a function of the subscan number, causing the continuum level to change. Second, $T_\mathrm{SW}$ obviously contains a contribution which changes slowly, likely as a function of the feed rotation angle and elevation. To test both dependencies, a measurement without any feed rotation was examined. Here, SW residuals of almost the same amplitude occur. This seems to rule out the feed angle correspondence, but the SWs are polarized (see also Heiles 2005); hence, a feed rotation should lead to a net effect. In most of our measurements the differential feed rotation is rather small such that the SWs are not significantly affected. Furthermore, several drift scans having constant elevation angles and no feed rotation were analyzed. In most cases no residual SWs show up. A few measurements contain a residual pattern, however, not such a clear sine-wave-like signal as in the lower panel of Figure 10. These patterns might be due to solar irradiation, as they seem to occur only during daytime.

Fig. 10. — In order to test the stability of the computed gain curves, the relative changes of G are visualized in the upper panel. It shows that, except for a feature in the center of the band, the median of G per subscan is equal to the total median of G as calculated for the complete measurement. The bottom panel contains the relative changes of $T_\mathrm{A} + T_\mathrm{sys}$ (computed accordingly) which shows a dependency of the continuum level due to changes in elevation and a residual multi-modal sine-wave pattern which is attributed to rather fast changes in $T_\mathrm{SW}$.

Finally, we discuss a method which enables us to measure the gain calibration factors for all seven feeds simultaneously. In Kalberla et al. (1982) only the flux for the exact positions on the calibration sources was determined. In order to calibrate the offset feeds, it would hence be necessary to point each of them in turn at the exact location of the calibration region. To use the observing time more efficiently (by about an order of magnitude), we chose to map the area around the standard calibration sources S7 and S8. From these maps we extract reference flux values (integrated spectral line fluxes within a certain velocity interval) to calculate the gain factors for all feeds. In this scheme, only the central feed is positioned on S7/S8 exactly. Note that our calibration method uses the integral values of the feed response function rather than the peak values.
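A hedged Python sketch of this multi-feed calibration idea follows. It assumes that a reference line flux has already been extracted from the calibration map at the position seen by each feed, and it defines the gain factor as the ratio of reference to measured integrated flux; the function names, the dictionary layout, and all numbers are hypothetical.

```python
def integrated_line_flux(velocities, spectrum, v_min, v_max):
    """Trapezoidal integral of spectrum over v_min <= v <= v_max.

    velocities must be sorted in ascending order; only channel pairs that
    lie completely inside the interval contribute.
    """
    total = 0.0
    for k in range(len(velocities) - 1):
        v0, v1 = velocities[k], velocities[k + 1]
        if v0 >= v_min and v1 <= v_max:
            total += 0.5 * (spectrum[k] + spectrum[k + 1]) * (v1 - v0)
    return total


def feed_gain_factors(reference_flux, measured_flux):
    """Per-feed ratio of reference flux to uncalibrated measured flux."""
    return {feed: reference_flux[feed] / measured_flux[feed]
            for feed in measured_flux}


# Hypothetical spectrum: a flat 2 K line between -10 and +10 km/s.
velocities = [-20.0, -10.0, 0.0, 10.0, 20.0]
spectrum = [0.0, 2.0, 2.0, 2.0, 0.0]
flux = integrated_line_flux(velocities, spectrum, -10.0, 10.0)

# Hypothetical reference and measured fluxes for two of the seven feeds.
factors = feed_gain_factors(
    {"feed0": flux, "feed1": 36.0},
    {"feed0": 20.0, "feed1": 24.0},
)
```

Using the integrated flux rather than a peak value makes the factor less sensitive to noise in individual channels, which matches the remark above about integral values.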

4.2.3. Frequency switching techniques

Using standard in-band frequency switching 
one usually computes the term 



P S1S (f IF) -P TCi (f IF 

P Icf (fw) 



n is (/R F ) + ra (/rf) - T s r y e s f (/ RF )) 



AG / 

ir( : 



T s ^ + T s f s+T ^(f RF ) 



1- 



Tl 



~r -L sys 

(8) 



to get rid of the gain curve (Heiles 2007). Here, the frequency-dependent and independent (continuum) parts were treated separately. However, as Heiles (2007) points out, the method relies highly on the assumption that the relative gain difference between both LO phases is small, $\Delta G/G \ll 1$, which is definitely not valid for the multi-beam receiver. Furthermore, the term $T_\mathrm{sys}^\mathrm{sig}(f_\mathrm{RF}) - T_\mathrm{sys}^\mathrm{ref}(f_\mathrm{RF})$ must be negligible, which is not the case due to the standing wave contribution. Heiles (2005) reports similar issues for the Arecibo telescope. Note that the least-squares frequency switching technique is not affected by the SW contribution, but would still fail in the presence of a non-trivial RF filter curve (Heiles 2007).
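To make the two failure modes concrete, the following Python sketch evaluates the frequency-switching ratio for a single channel. It assumes the simple model $P = G\,(T_\mathrm{A} + T_\mathrm{sys})$ with the line present only in the signal phase, which is a simplification of the in-band case discussed above; the function name and all temperatures and gains are made-up illustration values.

```python
def frequency_switch_estimate(t_line, t_sys_sig, t_sys_ref, gain_sig, gain_ref):
    """Return T_sys_ref * (P_sig - P_ref) / P_ref for a single channel.

    Assumes P = G * (T_A + T_sys), with the line contributing only to the
    signal phase. All arguments are hypothetical inputs.
    """
    p_sig = gain_sig * (t_line + t_sys_sig)
    p_ref = gain_ref * t_sys_ref
    return t_sys_ref * (p_sig - p_ref) / p_ref


# Ideal case: identical gain and system temperature in both LO phases.
ideal = frequency_switch_estimate(1.0, 25.0, 25.0, 1.0, 1.0)

# Gain differs by 5% between the phases (Delta G / G = 0.05).
gain_mismatch = frequency_switch_estimate(1.0, 25.0, 25.0, 1.05, 1.0)

# A standing wave adds 0.5 K to the system temperature of the signal phase.
sw_mismatch = frequency_switch_estimate(1.0, 25.5, 25.0, 1.0, 1.0)
```

In this toy setup a 1 K line is recovered exactly in the ideal case, but comes out near 2.3 K with the 5% gain mismatch and near 1.5 K with the 0.5 K standing-wave offset, because both errors are multiplied by the much larger system temperature.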

The SW problem can also have an impact on data observed by position switching when $T_\mathrm{sys}^\mathrm{on}(f_\mathrm{RF}) \neq T_\mathrm{sys}^\mathrm{off}(f_\mathrm{RF})$, which can happen if the feed rotation or elevation angles change between both phases. Furthermore, in the vicinity of moderate or strong continuum sources, residual effects can occur. Also, if the reference spectrum is obtained by computing the median of a subscan, the elevation should not change during the scan line; otherwise the SW contribution cannot be considered to be constant.

4.3. Stray-radiation correction

In order to compute the true brightness temperature, the antenna temperature $T_\mathrm{A}$ must be deconvolved to account for the sidelobe pattern of the antenna response function. This so-called SR correction, $T_\mathrm{SR}$, can be expressed as an additive term such that



$$T_\mathrm{B} = \frac{T_\mathrm{A}}{\eta_\mathrm{eff}} - T_\mathrm{SR} \qquad (9)$$



with the antenna gain efficiency, $\eta_\mathrm{eff}$. SR can contribute a significant fraction of the measured spectrum, especially in galactic science where Milky Way emission is observed toward all lines of sight. For EBHIS we make use of the existing SR code developed by Kalberla (1978) which was successfully applied to the LAB (Kalberla et al. 2005) and the GASS (Kalberla et al. 2010) surveys.



4.4. Baseline fitting 

After flux calibration and subtraction of the SR contribution, the remaining baselines, i.e., the standing waves, must be removed. Usual polynomial baseline fitting is not very well suited for our problem, as the SW contribution would either need extremely high polynomial orders or require piece-wise fitting of smaller spectral areas, leading to problems of connecting the overlap regions. Therefore, we simply apply our IGS algorithm (kernel width of $\sigma = 0.1$ MHz). For higher accuracy, the software allows us to perform the IGS on the median of all spectral dumps of a subscan, which in most cases leads to better results, as the baseline is only slowly changing with time. In some cases the broad diffuse part of the Milky Way line emission is not flagged to the full width. For improvement, we produced an SQL table containing spectral windows based on the LAB survey, which can be used as priors. In Figure 11 we show an example spectrum with the baseline fit as computed for the median of all spectral dumps within a subscan.
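The role of prior spectral windows in the baseline estimate can be sketched in Python as follows. This is not the IGS algorithm, whose details are outside this section; it is a single-pass Gaussian-weighted smoothing that gives zero weight to flagged channels, with the flags standing in for the LAB-based windows. The function name, the kernel width in channels, and the test spectrum are assumptions.

```python
import math


def masked_gaussian_baseline(spectrum, flagged, sigma_channels):
    """Gaussian-weighted running mean that ignores flagged channels.

    flagged[i] is True for channels inside a line window (or hit by RFI);
    those channels get zero weight, so the baseline is interpolated across
    them from the unflagged neighbourhood.
    """
    half_width = int(math.ceil(4 * sigma_channels))
    baseline = []
    for i in range(len(spectrum)):
        weighted_sum = 0.0
        weight_total = 0.0
        first = max(0, i - half_width)
        stop = min(len(spectrum), i + half_width + 1)
        for j in range(first, stop):
            if flagged[j]:
                continue
            weight = math.exp(-0.5 * ((j - i) / sigma_channels) ** 2)
            weighted_sum += weight * spectrum[j]
            weight_total += weight
        # Fall back to 0 if every channel within reach is flagged.
        baseline.append(weighted_sum / weight_total if weight_total else 0.0)
    return baseline


# Hypothetical spectrum: flat 3 K baseline plus a 10 K line in channels 8-11.
spectrum = [3.0] * 20
for channel in range(8, 12):
    spectrum[channel] += 10.0
flagged = [8 <= channel < 12 for channel in range(20)]

baseline = masked_gaussian_baseline(spectrum, flagged, sigma_channels=3.0)
residual = [s - b for s, b in zip(spectrum, baseline)]
```

If the line window were too narrow, part of the line would enter the baseline estimate and be subtracted, which is the situation the prior windows are meant to prevent.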

4.5. Gridding 

A lot of effort went into the development of new gridding software to allow serial data processing. This is necessary, as hundreds of gigabytes of spectral data have to be gridded together to form even relatively small data cubes. Due to the serial approach, only the size of the produced data cube puts constraints on the memory usage of the software, such that large areas of the sky can be processed for the full spectral range in one cycle.

4.5.1. Map calculation

The contribution of each individual spectral dump to the pixels on the grid is calculated using a Gaussian of size $\sigma_\mathrm{kernel}$ as weighting function. The weighting factors for each pixel are accumulated in a weight map (or in a weight cube, if flags are to be applied). Finally, each pixel in the data cube is divided by the associated cumulated weighting factor. The gridding method is equivalent to a (spatial) convolution of the individual spectra with a Gaussian kernel. Consequently, the effective angular resolution is slightly degraded; typically we use a kernel width of 5.4′ (FWHM) which enlarges the effective telescope beam of 9′ to 10.5′.
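Both the beam broadening and the weighted accumulation can be checked with a small Python sketch. It assumes Gaussian beam and kernel shapes (so that FWHMs add in quadrature) and uses flat-sky distances on a tiny patch instead of true angular distances; the function names and the sample dumps are hypothetical.

```python
import math

FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def effective_beam(beam_fwhm, kernel_fwhm):
    """FWHM after convolving two Gaussians: widths add in quadrature."""
    return math.hypot(beam_fwhm, kernel_fwhm)


def grid_dumps(dumps, pixel_centers, kernel_fwhm):
    """Grid scalar samples onto pixels with Gaussian weights.

    dumps         : list of (x, y, value) tuples, positions in arcmin
    pixel_centers : list of (x, y) pixel positions in arcmin
    Returns one weighted mean per pixel (None where no weight was collected).
    """
    sigma = kernel_fwhm * FWHM_TO_SIGMA
    data = [0.0] * len(pixel_centers)
    weights = [0.0] * len(pixel_centers)
    for x, y, value in dumps:  # serial: one dump at a time
        for p, (px, py) in enumerate(pixel_centers):
            dist_sq = (x - px) ** 2 + (y - py) ** 2
            w = math.exp(-0.5 * dist_sq / sigma ** 2)
            data[p] += w * value
            weights[p] += w
    return [d / w if w > 0 else None for d, w in zip(data, weights)]


print(round(effective_beam(9.0, 5.4), 1))  # 10.5 (arcmin)

pixels = [(0.0, 0.0), (3.0, 0.0)]
gridded = grid_dumps([(0.0, 0.0, 2.0), (3.0, 0.0, 2.0)], pixels, 5.4)
```

Because only the running sums `data` and `weights` are kept, memory use scales with the output grid and not with the number of dumps, which is the point of the serial design described above.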

The algorithm is independent of the chosen projection system (for the pixel grid), as true angular distances are used for calculations. Accordingly, it is necessary to convert between true coordinates and the pixel representation. For such transformations, the gridder uses the world coordinate system (WCS) enhancement of the CFITSIO library¹ (WCSlib; Greisen & Calabretta 2002; Calabretta & Greisen 2002; Greisen et al. 2006) to allow gridding into many different projection systems.

The software allows us to subtract a low-order polynomial from each dump to remove the continuum flux, which could also be used to calculate continuum maps, though the elevation-dependent contribution would need some extra treatment.

4.5.2. Velocity regridding

Fig. 11. — To remove the baselines, the IGS algorithm is applied. The upper panel displays a complete spectrum, while the lower panels contain a zoom-in, showing that the filtering procedure works well in the presence of strong and broad emission lines, as well as in the vicinity of RFI signals and even for extended ranges of contaminated data.

¹ http://heasarc.nasa.gov/docs/software/fitsio/fitsio.html

The LO frequencies at the 100-m telescope are fixed to a constant value at the beginning of each subscan (calculated for the central feed). Therefore, the spectra have to be regridded in the velocity/frequency space to account for shifts in the local standard of rest frame during a single subscan, as well as for differences between the central and the offset feeds. The spectral regridding is performed for each individual dump before gridding to the data cube, using the Akima spline algorithm provided by the GNU Scientific Library² (GSL; Galassi et al. 2009).
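The regridding step can be illustrated with a dependency-free Python sketch. Note the substitution: the text uses the Akima spline from GSL, whereas this example uses plain linear interpolation to stay within the standard library, so it shows the resampling logic but not the spline itself. The function name and the sample spectrum are assumptions.

```python
from bisect import bisect_right


def regrid_spectrum(source_velocities, spectrum, target_velocities):
    """Resample a spectrum onto target_velocities by linear interpolation.

    source_velocities must be strictly ascending. Target channels outside
    the sampled range are set to None instead of being extrapolated.
    """
    last = len(source_velocities) - 1
    result = []
    for v in target_velocities:
        if v < source_velocities[0] or v > source_velocities[last]:
            result.append(None)
            continue
        # Index of the source interval [k, k + 1] that contains v.
        k = min(bisect_right(source_velocities, v) - 1, last - 1)
        v0, v1 = source_velocities[k], source_velocities[k + 1]
        fraction = (v - v0) / (v1 - v0)
        result.append(spectrum[k] + fraction * (spectrum[k + 1] - spectrum[k]))
    return result


# Hypothetical dump whose velocity axis is offset by 0.25 km/s from the grid.
source_v = [0.0, 1.0, 2.0, 3.0]
dump = [0.0, 2.0, 4.0, 6.0]
target_v = [0.25, 1.25, 2.25, 3.25]
regridded = regrid_spectrum(source_v, dump, target_v)
```

Returning `None` outside the sampled range keeps unsupported channels from receiving extrapolated values; a later gridding stage would have to skip them, for example through a weight cube.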



5. First results 

In Figure 12 we show as an example some velocity planes of one of our measurements, showing the wealth of details which can be found in the EBHIS data. We find in this particular field a lot of IVC and HVC gas, exhibiting filaments and clumps many of which are probably not resolved with the Effelsberg telescope beam of 9′. The $5 \times 5$ deg$^2$ map was observed for a total integration time of about 140 minutes leading to an rms noise level of 65 mK, equivalent to a $4\sigma$ column density limit of about $3.6 \times 10^{18}$ cm$^{-2}$ (calculated for a Gaussian-shaped emission line of width 20 km s$^{-1}$ and spectral data smoothed to 10 km s$^{-1}$, i.e., a $4\sigma$ detection in two adjacent spectral channels). These values match the theoretical expectation (due to the low elevation angle, the system temperature was relatively high with 40 K). For a region of the data cube being free of any emission the miriad task imhist was used to calculate a noise histogram (see Figure 13). It reveals a distribution which is almost perfectly described by a Gaussian. Remaining RFI, which would show up as deviations from the standard distribution, is not visible.

To achieve a first full coverage of the northern sky, expected for Spring 2011, the scan speed was increased by a factor of 2. Therefore, we expect for the first data release a noise level of < 90 mK (or $5 \times 10^{18}$ cm$^{-2}$) which is very similar to the noise level of the GASS survey (when smoothing EBHIS data to the GASS beam size).
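The expected noise after doubling the scan speed can be estimated with a few lines of Python. The sketch assumes ideal radiometer behaviour, i.e., rms noise proportional to the inverse square root of the integration time per sky position, and takes the 65 mK and $3.6 \times 10^{18}$ cm$^{-2}$ values of the test field as the reference; the function name is hypothetical.

```python
import math


def scaled_noise(rms_reference, speed_factor):
    """Noise after changing the scan speed by speed_factor.

    Assumes rms is proportional to 1 / sqrt(integration time) and that the
    integration time per sky position is inversely proportional to speed.
    """
    return rms_reference * math.sqrt(speed_factor)


rms_mk = scaled_noise(65.0, 2.0)          # rms noise in mK
column_limit = scaled_noise(3.6e18, 2.0)  # 4-sigma column density in cm^-2
print(round(rms_mk), f"{column_limit:.1e}")  # 92 5.1e+18
```

Under this idealized scaling the results land near 92 mK and $5.1 \times 10^{18}$ cm$^{-2}$, close to the roughly 90 mK and $5 \times 10^{18}$ cm$^{-2}$ quoted above; the reference field was observed at a relatively high system temperature, so the sketch should be read as an order-of-magnitude check only.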

Currently, about a third of the northern hemisphere has been observed. The coverage is inhomogeneously distributed over many smaller test fields (1 to 100 deg$^2$) and two larger coherent sky portions, each covering about 2000 deg$^2$. The first large area is toward the northern galactic pole and the second one toward the northern tip of the Magellanic Leading Arm. In Figure 12 (bottom panel), we show a column density map of the latter calculated for the velocity interval $-50 < v_\mathrm{lsr} < +50$ km s$^{-1}$.

To evaluate the EBHIS data in comparison with previous surveys, we compare in Figure 14 EBHIS with GASS and a selected area from the Canadian Galactic Plane Survey (CGPS; Taylor et al. 2003). The upper panel displays channel maps of EBHIS and GASS ($v_\mathrm{lsr} = -46$ km s$^{-1}$) showing an intermediate-velocity structure. The contour levels are the same for both data sets, though the nominal noise levels are different. Due to the higher angular resolution of EBHIS more details are visible. Figure 14 (bottom panel) shows an overlap region of the CGPS superposed with EBHIS contour lines. Despite the different angular resolution, EBHIS traces the same structures as the CGPS.

Fig. 12. — Top and middle panels show several velocity planes of an example data cube. Lots of filaments and clumps are visible which are often unresolved by the telescope beam of 9′ (marked by a circle in the lower left part of each map). The bottom panel contains a column density map of one of the two larger 2000 deg$^2$ areas observed so far. $N_\mathrm{HI}$ is calculated for the velocity interval $-50 < v_\mathrm{lsr} < +50$ km s$^{-1}$.

Fig. 13. — Noise histogram for a line-free part of the data cube as calculated using 22 million pixels. The solid line shows a Gaussian fitted to the data. The resulting rms is 65 mK which matches the theoretical expectation.

² http://www.gnu.org/software/gsl/

6. Summary 

We presented in this work the current status of the EBHIS data reduction procedures. The observed spectral data contain lots of RFI and exhibit a rather complicated gain curve shape. A lot of effort went into the investigation of these issues and the development of robust and computationally fast algorithms. We discussed the standing wave problem and its potential impact on the widely used frequency or position switching techniques.

The receiver quality was extensively tested. System temperatures are as expected and the complete receiving system has stability times sufficiently high to reach all scientific aims of the EBHIS. Using the standard calibrators S7 or S8, a gain calibration accuracy of better than 3% is easily obtained.

The current quality of the EBHIS data and the reduction software is promising. We are confident that even the first data release in 2011 will provide a database comparable in quality to HIPASS and GASS. This will also allow us to produce a successor to the LAB survey data by combining EBHIS and GASS, both of which have much higher sensitivity and resolution.
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